Recovery of Empty Nodes in Parse Structures
Authors
Abstract
In this paper, we describe a new algorithm for recovering WH-trace empty nodes. Our approach combines a set of hand-written patterns together with a probabilistic model. Because the patterns heavily utilize regular expressions, the pertinent tree structures are covered using a limited number of patterns. The probabilistic model is essentially a probabilistic context-free grammar (PCFG) approach with the patterns acting as the terminals in production rules. We evaluate the algorithm’s performance on gold trees and parser output using three different metrics. Our method compares favorably with state-of-the-art algorithms that recover WH-traces.
Related resources
Parsing and Empty Nodes
This paper describes a method for ensuring the termination of parsers using grammars that freely posit empty nodes. The basic idea is that each empty node must be associated with a lexical item appearing in the input string, called its sponsor. A lexical item, as well as labeling the node for the corresponding word, provides labels for a fixed number, possibly zero, of empty nodes. The number of n...
A Simple Pattern-matching Algorithm for Recovering Empty Nodes and their Antecedents
This paper describes a simple pattern-matching algorithm for recovering empty nodes and identifying their co-indexed antecedents in phrase structure trees that do not contain this information. The patterns are minimal connected tree fragments containing an empty node and all other nodes co-indexed with it. This paper also proposes an evaluation procedure for empty node recovery procedures which ...
A Common Parsing Scheme for Left- and Right-Branching Languages
This paper presents some results of an attempt to develop a common parsing scheme that works systematically and realistically for typologically varied natural languages. The scheme is bottom-up, and the parser scans the input text from left to right. However, unlike the standard LR(k) parser or Tomita's extended LR(1) parser, the one presented in this paper is not a pushdown automaton based on ...
Effects of Empty Categories on Machine Translation
We examine effects that empty categories have on machine translation. Empty categories are elements in parse trees that lack corresponding overt surface forms (words) such as dropped pronouns and markers for control constructions. We start by training machine translation systems with manually inserted empty elements. We find that inclusion of some empty categories in training data improves the ...
Trace Prediction and Recovery with Unlexicalized PCFGs and Slash Features
This paper describes a parser which generates parse trees with empty elements in which traces and fillers are co-indexed. The parser is an unlexicalized PCFG parser which is guaranteed to return the most probable parse. The grammar is extracted from a version of the PENN treebank which was automatically annotated with features in the style of Klein and Manning (2003). The annotation includes GP...
Publication date: 2007